PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification

Authors

  • Dominique Brunato
  • Andrea Cimino
  • Felice Dell'Orletta
  • Giulia Venturi
Abstract

In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource, we develop a new method for automatically acquiring a corpus of complex-simple sentence pairs that captures structural transformations and is particularly suitable for text simplification. The method requires a large amount of text, which can be easily extracted from the web, making it suitable also for less-resourced languages. We test it on Italian, making available the biggest Italian corpus for automatic text simplification.

Similar resources

Corpus-based Sentence Deletion and Split Decisions for Spanish Text Simplification

This study addresses the automatic simplification of texts in Spanish in order to make them more accessible to people with cognitive disabilities. A corpus analysis of original and manually simplified news articles was undertaken in order to identify and quantify relevant operations to be implemented in a text simplification system. The articles were further compared at sentence and text level ...


Automatic Simplification of Spanish Text for e-Accessibility

In this paper we present an automatic text simplification system for Spanish which intends to make texts more accessible for users with cognitive disabilities. This system aims at reducing the structural complexity of Spanish sentences in that it converts complex sentences into two or more simple sentences and therefore reduces reading difficulty.


A Tagging Approach to Identify Complex Constituents for Text Simplification

The occurrence of syntactic phenomena such as coordination and subordination is characteristic of long, complex sentences. Text simplification systems need to detect and categorise constituents in order to generate simpler sentences. These constituents are typically bounded or linked by signs of syntactic complexity, which include conjunctions, complementisers, wh-words, and punctuation marks. T...


An Open Corpus of Everyday Documents for Simplification Tasks

In recent years interest in creating statistical automated text simplification systems has increased. Many of these systems have used parallel corpora of articles taken from Wikipedia and Simple Wikipedia or from Simple Wikipedia revision histories and generate Simple Wikipedia articles. In this work we motivate the need to construct a large, accessible corpus of everyday documents along with t...


Learning When to Simplify Sentences for Natural Text Simplification

This paper introduces a corpus-based approach for selecting sentences that require simplification in the context of a Brazilian Portuguese text simplification system. Based on a parallel corpus of original and simplified text versions, we apply a binary classifier to decide in which circumstances a sentence should or should not be split – which is the most important syntactic simplification operation – ...


Publication date: 2016